Abstract: Distributed file systems (DFSs) are a key component of cloud-scale data processing middleware, so evaluating their performance is important. In this paper, we propose a systematic and practical performance analysis framework driven by architecture and design models that define the structure and behaviour of typical master/slave DFSs. Our approach differs from others in three ways: 1) most existing work relies on performance measurements under a variety of workloads and strategies, on comparisons with other DFSs, or on running application programs, whereas our approach is based on architecture- and design-level models and systematically derived performance models; 2) our approach can evaluate the performance of DFSs both qualitatively and quantitatively; and 3) our approach can evaluate not only the overall performance of a DFS but also that of its components and individual steps. We demonstrate the effectiveness of our approach by evaluating the Hadoop Distributed File System (HDFS).

Keywords: DFS, HDFS, performance, MapReduce, PVFS.